# ViT-GPT2 Architecture
Vit Gpt2 Image Captioning
Apache-2.0
An image captioning model based on the ViT and GPT2 architectures that generates natural language descriptions of input images.
Image-to-Text
aryan083
31
0
Rgb Language Cap
Apache-2.0
A vision-language model trained on the COCO dataset that generates descriptive text including the spatial relationships between entities in an image.
Image-to-Text
Transformers English

voxreality
24
0
Rgb Language Cap
MIT
A spatially aware vision-language model that recognizes spatial relationships between objects in images and generates descriptive text.
Image-to-Text
Transformers English

sadassa17
15
0
Vit Gpt2 Verifycode Caption
Apache-2.0
A captcha recognition model based on the ViT-GPT2 architecture, fine-tuned on a dataset of 60,000 images to identify the text in captcha images.
Image-to-Text
Transformers

AIris-Channel
28
1
Image Caption Generator
A vision-language model trained on the Flickr8k dataset that generates natural language descriptions of input images.
Image-to-Text
Transformers

bipin
177
15